The goal of this study is to retrospectively determine the factors that influenced the spatiotemporal spread of COVID-19 throughout the United States during the first wave of the pandemic. Specifically, we aim to explain the role of county-level attributes and county-county mobility patterns in the spread of COVID-19. Additionally, the model can aid in predicting future spatial spread in the United States in the event of regional containment.
Our approach involves fitting a stochastic model that predicts the rate of COVID-19 importation into new counties in the United States. The model is updated daily from March 1, 2020 to August 3, 2020. The time of infection for each county is taken from county-level COVID-19 case data compiled by the New York Times from reports by state and local health agencies. The probabilities of COVID-19 importation into each potential receiving county \(j\) from each potential transmitting county \(i\) (source) are modeled using a logit parameterization, with the probability of infection from county \(i\) to county \(j\) (\(p_{ij}\)) at each time step defined by a generalized gravity model.
County-level attributes that we considered include population size, COVID-19 cases, and non-pharmaceutical interventions in place (mask mandates, stay-at-home orders, gathering size limits, and bar closures). County-county mobility patterns and connections that we considered include residence-workplace commuting flows from the American Community Survey (ACS), estimated daily domestic flight passenger volume based on OAG data from 2019 and from the pandemic period (March 2020-July 2020), and the Facebook Social Connectedness Index (SCI).
The results in this summary are from the best-fit model, which estimates the probability of COVID-19 importation based on county population sizes (\(mass_{ij}=pop_i \times pop_j\)), distances between counties (\(dist_{ij}\)), total COVID-19 cases reported in the previous 10 days in the source county \(i\) (\(cases_i\)), the estimated number of commuters between counties \(i\) and \(j\) based on the ACS (\(acs_{ij}\)), the estimated number of daily flight passengers traveling between counties \(i\) and \(j\) from March 2020-July 2020 (\(flights_{ij}\)), and four non-pharmaceutical interventions in place in county \(i\) (\(bars_i\), \(sah_i\), \(mask_i\), \(gather_i\)).
\[\begin{equation} p_{ij} = \frac{1}{1 + e^{\beta_0 + \frac{mass^{\beta_2}_{ij} \, cases^{\beta_3}_i}{dist^{\beta_1}_{ij}} + \beta_4 \log(acs_{ij} + 1) + \beta_5 \log(flights_{ij}) + \beta_6 bars_i + \beta_7 sah_i + \beta_8 mask_i + \beta_9 gather_i}} \;\; \text{(Eq. 1)} \end{equation}\]
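The case covariate in Eq. (1) is described as the total number of cases reported in the source county over the previous 10 days. A minimal sketch of such a windowed sum follows, assuming the input is a vector of daily new cases for one county and that the window covers days \(t-10\) through \(t-1\) (the text does not say whether the current day is included). The function name `rolling_cases` is illustrative and not part of the project code.

```r
# Sum of new cases over the `window` days before each day t (day t excluded).
# Assumption: `new_cases` holds daily new cases for one county, in date order.
rolling_cases <- function(new_cases, window = 10) {
  n <- length(new_cases)
  cs <- c(0, cumsum(new_cases))      # cs[t + 1] = total of new_cases[1..t]
  vapply(seq_len(n), function(t) {
    lo <- max(t - window, 1)         # first day inside the window
    hi <- t - 1                      # last day inside the window
    if (hi < lo) 0 else cs[hi + 1] - cs[lo]
  }, numeric(1))
}

# County case series are often published as cumulative counts, so they are
# differenced first. The numbers here are invented for illustration.
cumulative <- c(0, 0, 1, 3, 3, 6)
new_cases <- diff(c(0, cumulative))
rolling_cases(new_cases)
```

Early days simply use a shorter window, since fewer than 10 previous days exist.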
Models are fit using maximum likelihood estimation and the best model is selected using AIC.
Table 1 contains the parameter estimates for the model specified by Eq. (1). Under these estimates, infection probability increases with population size and decreases with distance between counties. Higher numbers of COVID-19 cases in the source county are also associated with higher infection probability. Counties with higher commuting and domestic flight passenger flows between them also have a higher risk of COVID-19 transmission. All four interventions for which we have data are associated with a lower probability of COVID-19 spread from infected counties to uninfected counties. Table 2 shows AIC values for iterated model fits.
| model | beta0 | beta1 | beta2 | beta3 | beta4 | beta5 | beta6 | beta7 | beta8 | beta9 |
|---|---|---|---|---|---|---|---|---|---|---|
| Model1 | 8.03 | -1.77 | -1 | -0.7 | -0.47 | -0.16 | -1.15 | 0.94 | 1.2 | 0.09 |
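To make the functional form concrete, Eq. (1) can be evaluated directly with the Table 1 estimates. This is a sketch only: the function and argument names are illustrative, the covariate values are invented, and the units or scaling used for the fitted model are not stated in the text, so the resulting numbers should not be read as real predictions.

```r
# Parameter estimates copied from Table 1.
beta_hat <- c(beta0 = 8.03, beta1 = -1.77, beta2 = -1, beta3 = -0.7,
              beta4 = -0.47, beta5 = -0.16, beta6 = -1.15, beta7 = 0.94,
              beta8 = 1.2, beta9 = 0.09)

# Importation probability p_ij as written in Eq. (1).
# Assumption: cases and flights are strictly positive, because Eq. (1)
# raises cases to a power and takes log(flights) without an offset.
import_prob <- function(beta, mass, cases, dist, acs, flights,
                        bars = 0, sah = 0, mask = 0, gather = 0) {
  gravity <- mass^beta[["beta2"]] * cases^beta[["beta3"]] / dist^beta[["beta1"]]
  eta <- beta[["beta0"]] + gravity +
    beta[["beta4"]] * log(acs + 1) +
    beta[["beta5"]] * log(flights) +
    beta[["beta6"]] * bars + beta[["beta7"]] * sah +
    beta[["beta8"]] * mask + beta[["beta9"]] * gather
  1 / (1 + exp(eta))
}

# Hypothetical county pair, with and without a stay-at-home order in county i.
base <- import_prob(beta_hat, mass = 4, cases = 10, dist = 2,
                    acs = 50, flights = 20)
with_sah <- import_prob(beta_hat, mass = 4, cases = 10, dist = 2,
                        acs = 50, flights = 20, sah = 1)
c(base = base, with_sah = with_sah)
```

Because the linear predictor sits in the denominator as \(1 + e^{\eta}\), a negative coefficient raises \(p_{ij}\) and a positive one lowers it.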
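```r
# Build Table 2: read the model-comparison output and relabel each fit.
# find_file() is assumed to be a path helper defined elsewhere in the project.
read.csv(find_file("model_output/final_tab.csv")) %>%
  rename(NLL = nll, AIC = aic, Model = model) %>%
  # Labels are assigned by position, so they must match the row order of the CSV.
  mutate(Model = c('flights, acs, interventions',
                   'interventions',
                   'log flights, acs',
                   'flights, acs',
                   'flights',
                   'acs',
                   'cases*',
                   'cases+',
                   'basic')) %>%
  kable(row.names = NA, caption = "Table 2. Model comparison") %>%
  kable_styling(full_width = FALSE)
```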
| Model | NLL | AIC |
|---|---|---|
| flights, acs, interventions | 11305.71 | 22631.42 |
| interventions | 11500.92 | 23017.83 |
| log flights, acs | 11431.10 | 22874.20 |
| flights, acs | 11591.43 | 23194.86 |
| flights | 11597.29 | 23204.58 |
| acs | 11606.33 | 23222.66 |
| cases* | 11614.79 | 23237.59 |
| cases+ | 11788.46 | 23584.92 |
| basic | 11864.39 | 23734.79 |
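The AIC column in Table 2 can be cross-checked against the NLL column using the standard definition \(AIC = 2k + 2 \cdot NLL\), where \(k\) is the number of fitted parameters. The table does not list \(k\), so the sketch below infers it from the two reported columns. That inferred count is an assumption drawn from the table, not a figure reported in the text.

```r
# Values copied from Table 2.
tab2 <- data.frame(
  Model = c("flights, acs, interventions", "interventions", "log flights, acs",
            "flights, acs", "flights", "acs", "cases*", "cases+", "basic"),
  NLL = c(11305.71, 11500.92, 11431.10, 11591.43, 11597.29,
          11606.33, 11614.79, 11788.46, 11864.39),
  AIC = c(22631.42, 23017.83, 22874.20, 23194.86, 23204.58,
          23222.66, 23237.59, 23584.92, 23734.79)
)

# Parameter count implied by AIC = 2k + 2 * NLL (rounded, since the table is rounded).
tab2$k_implied <- round((tab2$AIC - 2 * tab2$NLL) / 2)

# Difference from the best (lowest) AIC, then rank the models.
tab2$delta_AIC <- tab2$AIC - min(tab2$AIC)
tab2 <- tab2[order(tab2$AIC), ]
tab2
```

The lowest-AIC row is the model with flights, ACS commuting, and interventions, which matches the model reported in Table 1.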
We also use the fitted model to compare the effectiveness of different interventions in reducing the probability of spread of SARS-CoV-2 from transmitting counties to receiving counties. As expected, the probability of importing a first case robustly declines after a stay-at-home order is imposed in either a transmitting or receiving county. Surprisingly, the effect of mask orders in transmitting counties is nearly as strong. We also find evidence that gathering bans are more effective at preventing export than import of SARS-CoV-2. This is likely because gathering bans in transmitting counties discourage people from leaving to host or join events in receiving counties, regardless of whether the receiving county has a ban in place. Bar closures in transmitting counties are associated with a lower probability of transmission when they are enacted before the transmitting county receives its first case. We also find some weak evidence that cooperation in closing bars is important: the probability of transmission is lower when transmitting and receiving counties both enact bar closures than when only one of them does. Most importantly, these fitted models show that multiple dimensions of precaution matter. The interventions do not appear to be proxies for a single axis of precautionary behavior; each government action appears to be independently important.
The following map shows the model-predicted probability of reporting the first case in the next period. Probabilities change over time as underlying conditions, such as the number of cases in neighboring counties, change. Use the slider to show probabilities for a different day. Counties turn gray once they report their first case.
According to the best-fit model, the number of commuters between counties is positively associated with COVID-19 transmission. The commuting flows are based on estimates from the 2011-2015 ACS commuting survey from the US Census. These connections are predominantly short-distance commutes between cities and their surrounding suburbs, but they also include notable long-distance commuting flows.
```r
# Load the map helper, then draw the strongest 0.1% of ACS commuting flows.
source(find_file("summary/line_map_visualization.R"))
fun_line_map(outbreak_data, flow_data = ACS_Flows, upper_quantile = 0.999)
```
Fig. 3: The lines connect the county pairs that make up the strongest 0.1% (n = 4944) of pairwise county-county commuting flows.
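The `upper_quantile = 0.999` argument suggests that the map keeps only flows above the 99.9th percentile. The internals of `fun_line_map` are not shown here, so the following is only a sketch of that kind of filter. The data frame `flows`, the column name `commuters`, and the helper `top_flows` are all hypothetical, and the data are simulated.

```r
set.seed(1)

# Simulated pairwise flows: one row per origin-destination county pair.
flows <- data.frame(
  origin = sample(1000, 10000, replace = TRUE),
  destination = sample(1000, 10000, replace = TRUE),
  commuters = rlnorm(10000, meanlog = 2, sdlog = 2)
)

# Keep rows whose flow is strictly above the chosen quantile.
top_flows <- function(flows, value_col, upper_quantile = 0.999) {
  threshold <- quantile(flows[[value_col]], probs = upper_quantile, names = FALSE)
  flows[flows[[value_col]] > threshold, , drop = FALSE]
}

strongest <- top_flows(flows, "commuters")
nrow(strongest) / nrow(flows)   # share of pairs retained
```

Filtering on a quantile rather than a fixed flow size keeps the number of drawn lines proportional to the number of county pairs, whatever the units of the flow data.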
The volume of county-county domestic flight passengers is also positively associated with COVID-19 transmission. Data on domestic flight passenger volume are from the Official Airline Guide (OAG) and are available as monthly passenger totals for each flight path. Passengers were allocated to counties in catchment areas surrounding airports in proportion to each county’s population and distance to the airport, so that a smaller share of passengers was allotted to counties farther from the airport and to counties with smaller populations.
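The allocation step can be sketched as follows. The text says only that a county's share grows with its population and shrinks with its distance from the airport. The specific weight used here (population divided by one plus distance) and the catchment radius of 100 distance units are hypothetical choices made for illustration, as are all names and numbers.

```r
# Split an airport's passengers across counties in its catchment area.
# Assumption: weight = pop / (1 + dist) inside the radius, 0 outside it.
allocate_passengers <- function(passengers, pop, dist, radius = 100) {
  in_catchment <- dist <= radius
  weight <- ifelse(in_catchment, pop / (1 + dist), 0)
  if (sum(weight) == 0) return(rep(0, length(pop)))
  passengers * weight / sum(weight)
}

# Invented counties around a single airport.
counties <- data.frame(
  county = c("A", "B", "C", "D"),
  pop = c(500000, 500000, 50000, 800000),
  dist = c(5, 60, 5, 150)
)
counties$passengers <- allocate_passengers(12000, counties$pop, counties$dist)
counties
```

Normalizing the weights keeps the airport's total passenger count unchanged after allocation, whatever weighting rule is used.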
We originally fit the model to a static data set of mean 2019 flight passenger volume, which improved on a model without any flight data. However, a model fit with time-varying flight volume from the pandemic period (March 2020-July 2020) outperformed the model with the 2019 data only. This provides evidence that relative county-county flight passenger flows varied throughout the pandemic months and that passenger volume changed unequally across flight paths relative to baseline. This is evident in Fig. 4: some paths returned to close to the 2019 baseline quickly, whereas others remained far below baseline even in July (e.g. flight volume to Hawaii counties and New York City counties). However, since passenger volume data are not available in real time, we suggest that historical flight volume averages are still beneficial if the model is used to predict future COVID-19 transmission risk.
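```r
# Map the strongest 0.01% of 2019 flight connections, colored by each month's
# volume relative to the 2019 mean.
fun_perc_line_map_buttons(outbreak_data, flow_data = Flight_Flow,
                          baseline = Flight_flow_mean_2019, upper_quantile = 0.9999)
```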
Fig. 4: The lines connect the county pairs that make up the strongest 0.01% of pairwise flight connections in 2019 (n = 494), colored by each month’s volume as a percentage of the 2019 baseline.
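The coloring in Fig. 4 rests on each month's volume expressed as a percentage of the 2019 baseline. A minimal sketch of that calculation follows, assuming a hypothetical long-format data frame with one row per route and month. The column names `route`, `month`, `volume`, and `baseline_2019` are illustrative and the numbers are invented.

```r
# Invented monthly volumes for two routes, with their 2019 mean as baseline.
flights <- data.frame(
  route = rep(c("X-Y", "X-Z"), each = 3),
  month = rep(c("2020-03", "2020-04", "2020-05"), times = 2),
  volume = c(900, 100, 250, 400, 20, 30),
  baseline_2019 = rep(c(1000, 500), each = 3)
)

# Percent of baseline; routes with a zero baseline are set to NA rather than Inf.
flights$pct_baseline <- ifelse(flights$baseline_2019 > 0,
                               100 * flights$volume / flights$baseline_2019,
                               NA_real_)
flights
```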
The model is a closed system with US counties as the only potential sources of transmission. Because of this limitation, we start our analysis on a date by which local transmission was the primary source of infection in the United States. We chose March 1, 2020 as the start date, since widespread local transmission was estimated to be occurring in the US from late February onward[^1], and testing criteria were expanded on March 4 to include individuals without international travel history[^2]. However, testing availability was limited for some time and varied geographically in the US. Thus, the recorded infection times are likely to be biased later than the true infection times.

[^1]: Davis JT, Chinazzi M, Perra N, et al. Estimating the establishment of local transmission and the cryptic phase of the COVID-19 pandemic in the USA. Preprint. medRxiv. 2020;2020.07.06.20140285. Published 2020 Jul 7. doi:10.1101/2020.07.06.20140285

[^2]: CDC, “Updated Guidance on Evaluating and Testing Persons for Coronavirus Disease 2019 (COVID-19)”; https://emergency.cdc.gov/han/2020/han00429.asp